Introduction

In the ACM CHI 2018 study “Gender-Inclusive Design: Sense of Belonging and Bias in Web Interfaces”, the authors examine how introductory Computer Science course websites are perceived, specifically the sense of community belonging reported by young women who are shown a website coded as either neutral or masculine (a between-subjects design).

Participants review one of two course sites (in either the neutral or the masculine condition) and then take a survey about the website, which includes both multiple-choice and open-response questions. The survey proper is preceded by verification questions to ensure that participants have reviewed the site, and it concludes with a demographic survey.

The original paper and the repository

A link to the Qualtrics Survey is Here

Methods

Power Analysis

The original paper reports an effect size for the ambient belonging measure as d = .62. Given a one-tailed independent means t-test, sample sizes to achieve 80%, 90%, and 95% power would be 66, 91, and 114 total participants, respectively. Fewer than 1% of participants were excluded in the original study.
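The sample sizes above can be approximated in R with `power.t.test` from the base `stats` package. This is a minimal sketch, assuming a significance level of .05 and equal group sizes, neither of which is stated above. The reported figures may have come from a different tool, so totals can differ by a participant or two.

```r
# Sketch: required sample size for a one-tailed independent-samples t-test.
# Assumptions (not stated in the text): alpha = .05, equal group sizes.
d <- 0.62                               # effect size reported in the original paper
target_power <- c(0.80, 0.90, 0.95)

n_per_group <- sapply(target_power, function(p) {
  power.t.test(delta = d, sd = 1, sig.level = 0.05, power = p,
               type = "two.sample", alternative = "one.sided")$n
})

# Round up to whole participants and double for the total across both groups.
power_table <- data.frame(
  power       = target_power,
  n_per_group = ceiling(n_per_group),
  n_total     = 2 * ceiling(n_per_group)
)
print(power_table)
```

Setting `sd = 1` makes `delta` equal to the standardized effect size, so the reported d can be passed in directly.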

Planned Sample

We planned to recruit 111 participants (subject to power analysis) through Amazon Mechanical Turk, using the same criteria as the original paper: between 18 and 25 years of age and from the United States.

Materials

We reproduced the materials as described in the original paper, using materials provided by the original authors. The two webpages will have the same content as the originals but will be rehosted on a new domain. The original experiment features two websites, one neutral and one masculine.

(Image and caption from the original paper) The banner and other content of the gender-neutral interface used nature imagery (left), while the masculine interface included Star Trek imagery and styling evocative of a computer terminal (right).

Participants were asked a total of 22 questions across six measures of interest on 1-7 scales (1: “not at all”, 7: “extremely”). The measures were 1. Enrollment Intention, 2. Ambient Belonging, 3. Anticipated Success, 4. Self Confidence, 5. Future CS Study Intentions, 6. Gender-related Anxiety.

Participants were also asked demographic questions about gender, age, race, and education afterwards.

Procedure

Participants are first asked three free-text questions as an initial attention check, followed by a fourth question about the author of the website “to probe for suspicion about the study.”

From the original study:

> Participants were asked to review one of the two course webpages and answer survey questions about six main measures regarding their sense of ambient belonging, perceptions of the class and the discipline of computer-science, and gender-related anxiety. To prevent bias, participants were told that the study was intended to “learn more about young people’s attitudes towards studying Computer Science.” After agreeing to participate, participants were redirected to a Qualtrics survey, which randomly assigned them to review one of the two course pages.

The original study also included a coded open-ended question: “Would you take this class? Why or why not?” This question will be asked so that the replication follows the original as closely as possible, but the responses will not be analysed.

Otherwise, the procedure will follow the original closely, except that the URLs of the two websites will differ because they will be newly hosted by the replication author.

Analysis Plan

As in the original study, responses that fail the attention check questions will be excluded.

The responses will be analysed for statistically significant differences on the six measures (each the mean of its related questions) in two contrasts: women in the masculine condition versus women in the neutral condition, and women in the masculine condition versus all other groups. We use a t-test to assess the difference between the two groups in each contrast.

The key analyses are the measures for ambient belonging and gender-related anxiety of women in the masculine condition compared to all other groups. The difference between the two groups is tested with a t-test.
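The two contrasts described in this analysis plan can be sketched in base R as follows. The data frame `toy`, its column names (`gender`, `condition`, `AB`, `focal`), and the scores are made up for illustration; they are not the study's variables or data. `t.test` defaults to Welch's unequal-variance test.

```r
# Sketch of the planned contrasts on made-up scores (NOT study data).
# Hypothetical columns: gender, condition, AB (mean ambient-belonging score).
set.seed(1)
toy <- data.frame(
  gender    = rep(c("woman", "man"), each = 20),
  condition = rep(c("masculine", "neutral"), times = 20),
  AB        = round(runif(40, min = 1, max = 7), 2)
)

# Flag the focal group: women who saw the masculine site.
toy$focal <- toy$gender == "woman" & toy$condition == "masculine"

# Contrast 1: women in the masculine vs. women in the neutral condition.
women <- toy[toy$gender == "woman", ]
t_women <- t.test(AB ~ condition, data = women)

# Contrast 2: women in the masculine condition vs. all other participants.
t_focal <- t.test(AB ~ focal, data = toy)

print(t_women)
print(t_focal)
```

The same pattern would apply to the gender-related anxiety measure by swapping the outcome column.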

Differences from Original Study

The only differences from the original study are the omission of the open-response question from the analysis (though not from the survey itself) and the different URLs of the websites. We do not predict that the different URLs will have a significant effect, as participants will access the website via the same hyperlinked text. The original URLs were cs.stanford.edu/cs106a/(home/course), while the new URLs will be stanford.edu/~erawn/cs106a/(home/course).

Results

Data preparation

Participants will be excluded based on the attention check questions. The mean score for each target measure will be calculated from the relevant questions.
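As a minimal sketch of these two steps in base R, assume the attention-check answers have already been hand-coded into a pass/fail flag. The column `passed_checks`, the item names, and the helper `measure_mean` are hypothetical, and the values are made up for illustration.

```r
# Sketch on made-up responses (NOT study data).
# Hypothetical columns: passed_checks (hand-coded attention-check result),
# AB_1..AB_3 and GS_1..GS_2 (item responses on the 1-7 scale).
toy <- data.frame(
  ResponseId    = c("R_a", "R_b", "R_c"),
  passed_checks = c(TRUE, FALSE, TRUE),
  AB_1 = c(4, 2, 6), AB_2 = c(5, 2, 7), AB_3 = c(3, 2, 5),
  GS_1 = c(2, 7, 1), GS_2 = c(4, 7, NA)
)

# Step 1: drop participants who failed the attention checks.
kept <- toy[toy$passed_checks, ]

# Step 2: average the items that share a measure prefix, ignoring blanks.
measure_mean <- function(df, prefix) {
  items <- df[, startsWith(names(df), prefix), drop = FALSE]
  rowMeans(items, na.rm = TRUE)
}
kept$AB <- measure_mean(kept, "AB_")
kept$GS <- measure_mean(kept, "GS_")

print(kept[, c("ResponseId", "AB", "GS")])
```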

###Data Preparation

####Load Relevant Libraries and Functions
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringr)
library(ggplot2)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
####Import data
data = read.csv("data/test5.csv")
#data %>% row_to_names(row_number=2)
#### Data exclusion / filtering
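# Drop the two extra Qualtrics header rows that follow the column names
data_filtered = data[-1:-2,]
# Keep completed responses only. A comparison is needed here: `Finished = TRUE` names an argument instead of testing a value.
# Assumption: Finished is exported as "1" or "True", depending on the Qualtrics export settings.
data_filtered = data_filtered %>% filter(Finished %in% c("1", "True", "TRUE"))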
data_questions = data_filtered %>%
  select(ResponseId,
        #starts_with("EI"),
         starts_with("AB"),
         starts_with("AS"),
         starts_with("FN"),
         starts_with("CS"),
         starts_with("PC"),
         starts_with("LT"),
         starts_with("SN"),
         starts_with("MF"),
         starts_with("GS"),
         text_condition,
         starts_with("Q18"))

#### Prepare data for analysis - create columns etc.
data_longer = data_questions %>% pivot_longer(cols=-c("Q18", "text_condition", "ResponseId"),
                             names_to = 'Question',
                             values_to = 'Value')
data_longer = data_longer %>% 
  mutate(
    Category = str_extract(Question, "[A-Z]+")
  )
data_summary = data_longer %>%
  group_by(ResponseId, Category, Q18, text_condition) %>%
  summarize(MeanQuestion = mean(as.numeric(Value), na.rm=T))
## `summarise()` regrouping output by 'ResponseId', 'Category', 'Q18' (override with `.groups` argument)
data_summary
## # A tibble: 54 x 5
## # Groups:   ResponseId, Category, Q18 [54]
##    ResponseId        Category Q18   text_condition MeanQuestion
##    <chr>             <chr>    <chr> <chr>                 <dbl>
##  1 R_1jD2CpHEpN4QwQ4 AB       1     1                      4   
##  2 R_1jD2CpHEpN4QwQ4 AS       1     1                      4   
##  3 R_1jD2CpHEpN4QwQ4 CS       1     1                      4   
##  4 R_1jD2CpHEpN4QwQ4 FN       1     1                      4   
##  5 R_1jD2CpHEpN4QwQ4 GS       1     1                      4   
##  6 R_1jD2CpHEpN4QwQ4 LT       1     1                      4   
##  7 R_1jD2CpHEpN4QwQ4 MF       1     1                      4   
##  8 R_1jD2CpHEpN4QwQ4 PC       1     1                      4   
##  9 R_1jD2CpHEpN4QwQ4 SN       1     1                      4   
## 10 R_2UWRk0hvqyc5yaJ AB       1     2                      3.75
## # … with 44 more rows

Confirmatory analysis

As specified, we calculate the mean score for each target measure and then run a t-test comparing women to men for each condition. We then run a second analysis comparing women in the masculine condition to all other groups.

results = data_summary %>%
  group_by(Category, Q18, text_condition) %>%
  summarize(TotalMeanQuestion = mean(MeanQuestion, na.rm=T))
## `summarise()` regrouping output by 'Category', 'Q18' (override with `.groups` argument)
ab_male_1 = data_summary %>%
  filter(Q18 == "1", Category == "AB",text_condition == 1) 
ab_male_2 = data_summary %>%
  filter(Q18 == "1", Category == "AB",text_condition == 2) 
ab_female_1 = data_summary %>%
  filter(Q18 == "2", Category == "AB",text_condition == 1) 
ab_female_2 = data_summary %>%
  filter(Q18 == "2", Category == "AB",text_condition == 2) 

t.test(ab_male_1["MeanQuestion"],ab_male_2["MeanQuestion"])
## 
##  Welch Two Sample t-test
## 
## data:  ab_male_1["MeanQuestion"] and ab_male_2["MeanQuestion"]
## t = 1, df = 1, p-value = 0.5
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.463276  1.713276
## sample estimates:
## mean of x mean of y 
##     4.000     3.875
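To set a result like this against the effect size reported in the original paper (d = .62), a standardized mean difference can be computed alongside the t-test. The sketch below uses the pooled-standard-deviation form of Cohen's d. The helper `cohens_d` and the example vectors are illustrative and are not part of the study's analysis code or data.

```r
# Sketch: Cohen's d with a pooled standard deviation.
cohens_d <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Made-up scores (NOT study data) for two groups.
group_a <- c(4, 5, 6, 5, 4)
group_b <- c(3, 4, 5, 4, 3)
d_example <- cohens_d(group_a, group_b)
print(d_example)
```

With the objects from the analysis above, the call would take the numeric score vectors, for example `ab_male_1$MeanQuestion` and `ab_male_2$MeanQuestion`.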

Overview of Results (Taken from the Original Paper)

Side-by-side graph with original graph is ideal here

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.